An Abstraction Layer for SIMD Extensions

Authors

  • Franz Franchetti
  • Christoph W. Ueberhuber
Abstract

This paper presents an abstraction layer for short vector SIMD ISA extensions such as Intel’s SSE, AMD’s 3DNow!, Motorola’s AltiVec, and IBM’s Double Hummer. It provides unified access to short vector instructions via intermediate-level building blocks. These primitives are C macros that allow, for instance, portable and highly efficient implementations of discrete linear transforms such as FFTs and DCTs. The newly developed API is built on top of recently introduced C language extensions that give C programs access to short vector SIMD hardware through data types and intrinsic or built-in functions. Empirical evidence of the success of the portable SIMD API is provided by program runs of short vector versions of Spiral and Fftw, resulting in the fastest FFT implementations to date.
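The idea of C macros that hide the vendor-specific intrinsics can be illustrated with a minimal sketch. The macro names (`VEC_LOAD`, `VEC_STORE`, `VEC_ADD`, `VEC_SUB`), the type `vec4f`, and the function `butterfly` are hypothetical and chosen for illustration only; they are not the macros defined in the paper. Only an SSE backend and a plain scalar fallback are shown, and a real layer would presumably add one branch per supported extension.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 4                      /* floats per short vector (assumed) */

#if defined(__SSE__)
#include <xmmintrin.h>                   /* Intel SSE intrinsics */
typedef __m128 vec4f;
#define VEC_LOAD(p)     _mm_loadu_ps(p)
#define VEC_STORE(p, v) _mm_storeu_ps((p), (v))
#define VEC_ADD(a, b)   _mm_add_ps((a), (b))
#define VEC_SUB(a, b)   _mm_sub_ps((a), (b))
#else
/* Scalar fallback so the same source compiles without SIMD support. */
typedef struct { float f[VEC_WIDTH]; } vec4f;
static vec4f vec_load(const float *p) {
    vec4f v;
    for (int i = 0; i < VEC_WIDTH; i++) v.f[i] = p[i];
    return v;
}
static void vec_store(float *p, vec4f v) {
    for (int i = 0; i < VEC_WIDTH; i++) p[i] = v.f[i];
}
static vec4f vec_add(vec4f a, vec4f b) {
    for (int i = 0; i < VEC_WIDTH; i++) a.f[i] += b.f[i];
    return a;
}
static vec4f vec_sub(vec4f a, vec4f b) {
    for (int i = 0; i < VEC_WIDTH; i++) a.f[i] -= b.f[i];
    return a;
}
#define VEC_LOAD(p)     vec_load(p)
#define VEC_STORE(p, v) vec_store((p), (v))
#define VEC_ADD(a, b)   vec_add((a), (b))
#define VEC_SUB(a, b)   vec_sub((a), (b))
#endif

/* Radix-2 butterfly on real data: sum = x + y, diff = x - y.
 * Written only against the portable macros, never against intrinsics. */
static void butterfly(const float *x, const float *y,
                      float *sum, float *diff, size_t n) {
    assert(n % VEC_WIDTH == 0);          /* sketch handles whole vectors only */
    for (size_t i = 0; i < n; i += VEC_WIDTH) {
        vec4f a = VEC_LOAD(x + i);
        vec4f b = VEC_LOAD(y + i);
        VEC_STORE(sum + i,  VEC_ADD(a, b));
        VEC_STORE(diff + i, VEC_SUB(a, b));
    }
}
```

Because `butterfly` touches only the macro layer, porting it to another extension would in principle mean adding a new `#elif` branch rather than rewriting the transform code. Unaligned loads and stores are used here for simplicity; an aligned variant would likely be faster on hardware of that era.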


Similar resources

2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation

This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. Both topics are somewhat related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipel...


Instruction Set Architecture Abstraction

This technical report describes CHERI ISAv3, the third version of the This report describes the CHERI Instruction-Set Architecture (ISA) and design. The purpose of this tutorial was to introduce the computer architecture Pydgin is a framework for rapidly developing instruction-set simulators (ISSs) from a but is particularly well-suited for exploring the hardware/software abstraction. The Intel...


SIMD code generation in data-parallel programming

Today’s desktop PCs feature a variety of parallel processing units. Developing applications that exploit this parallelism is a demanding task, and a programmer has to obtain detailed knowledge about the hardware for efficient implementation. CGiS is a data-parallel programming language providing a unified abstraction for two parallel processing units: graphics processing units (GPUs) and the ve...


Hardware Abstraction Layer (HAL)

ION LAYER FOR IMPLEMENTATION OF EXTENSIONS IN PROGRAMMABLE NETWORKS Hardware Abstraction Layer (HAL)


Chemical Kinetics on Multi-core SIMD Architectures

Chemical kinetics modeling accounts for a significant portion of the computational time of atmospheric models. Effective application of multiple levels of heterogeneous parallelism can significantly reduce computational time, but implementation on emerging multi-core technologies can be prohibitively difficult. We introduce an approach for chemical kinetics modeling on multi-core SIMD architect...




Publication date: 2003